AITopics | candidate policy

1fa6269f58898f0e809575c9a48747ef-Paper.pdf

Neural Information Processing Systems | Apr-25-2026, 01:03:24 GMT

data mining, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)
(2 more...)

Efficient Policy Evaluation Across Multiple Different Experimental Datasets

Neural Information Processing Systems | Mar-22-2026, 21:18:44 GMT

Artificial intelligence systems are trained by combining various observational and experimental datasets from different source sites, and are increasingly used to reason about the effectiveness of candidate policies. One common assumption in this context is that the data in the source and target sites (where the candidate policy is due to be deployed) come from the same distribution. This assumption is often violated in practice, causing challenges for generalization, transportability, or external validity. Despite recent advances in determining the identifiability of the effectiveness of policies in a target domain, there are still challenges in accurately estimating effects from finite samples. In this paper, we develop novel graphical criteria and estimators for evaluating the effectiveness of policies (e.g., conditional, stochastic) by combining data from multiple experimental studies. Asymptotic error analysis of our estimators provides fast convergence guarantees. We empirically verify the robustness of the estimators through simulations.

artificial intelligence, name change, proceedings, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.79)

Fast Efficient Hyperparameter Tuning for Policy Gradient Methods

Supratik Paul, Vitaly Kurin, Shimon Whiteson

Neural Information Processing Systems | Feb-12-2026, 14:56:41 GMT

Neural Information Processing Systems http://nips.cc/

hoof, hyperparameter, optimization, (11 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)

Information Directed Reward Learning for Reinforcement Learning

Neural Information Processing Systems | Feb-7-2026, 19:03:16 GMT

From such expensive feedback, we aim to learn a model of the reward that allows standard RL algorithms to achieve high expected returns with as few expert queries as possible.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country: Europe > Austria > Vienna (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Fast Efficient Hyperparameter Tuning for Policy Gradient Methods

Supratik Paul, Vitaly Kurin, Shimon Whiteson

Neural Information Processing Systems | Oct-3-2025, 00:26:40 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, reinforcement learning, (12 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)

6add07cf50424b14fdf649da87843d01-AuthorFeedback.pdf

Neural Information Processing Systems | Oct-2-2025, 22:32:53 GMT

artificial intelligence, augmentation, machine learning, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.40)

Efficient On-Policy Reinforcement Learning via Exploration of Sparse Parameter Space

Zhang, Xinyu, Deb, Aishik, Mueller, Klaus

arXiv.org Artificial Intelligence | Oct-1-2025

Policy-gradient methods such as Proximal Policy Optimization (PPO) are typically updated along a single stochastic gradient direction, leaving the rich local structure of the parameter space unexplored. Previous work has shown that the surrogate gradient is often poorly correlated with the true reward landscape. Building on this insight, we visualize the parameter space spanned by policy checkpoints within an iteration and reveal that higher performing solutions often lie in nearby unexplored regions. To exploit this opportunity, we introduce ExploRLer, a pluggable pipeline that seamlessly integrates with on-policy algorithms such as PPO and TRPO, systematically probing the unexplored neighborhoods of surrogate on-policy gradient updates. Without increasing the number of gradient updates, ExploRLer achieves significant improvements over baselines in complex continuous control environments. Our results demonstrate that iteration-level exploration provides a practical and effective way to strengthen on-policy reinforcement learning and offer a fresh perspective on the limitations of the surrogate objective.
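
The idea of probing the region spanned by policy checkpoints, rather than keeping only the final gradient step, can be illustrated with a toy sketch. This is not the ExploRLer pipeline: `toy_return`, `probe_checkpoints`, the two-dimensional parameters, and the use of random convex combinations are all assumptions made for illustration. In a real on-policy setting, each evaluation would cost environment rollouts.

```python
import random


def toy_return(theta):
    # Hypothetical stand-in for an episodic return, peaked at (1.0, -0.5).
    return -((theta[0] - 1.0) ** 2 + (theta[1] + 0.5) ** 2)


def probe_checkpoints(checkpoints, evaluate, n_probes=200, seed=0):
    """Sample points in the convex hull of the checkpoints and keep the best."""
    rng = random.Random(seed)
    dim = len(checkpoints[0])
    # The checkpoints themselves stay in the pool, so the result is never worse.
    candidates = [list(c) for c in checkpoints]
    for _ in range(n_probes):
        # Normalized exponential draws give weights uniform on the simplex.
        raw = [rng.expovariate(1.0) for _ in checkpoints]
        total = sum(raw)
        weights = [r / total for r in raw]
        candidates.append(
            [sum(w * c[i] for w, c in zip(weights, checkpoints))
             for i in range(dim)]
        )
    return max(candidates, key=evaluate)


# Three hypothetical checkpoints saved within one training iteration.
checkpoints = [(0.0, 0.0), (2.0, 0.0), (1.0, -1.0)]
best = probe_checkpoints(checkpoints, toy_return)
print(best, toy_return(best))
```

Because the original checkpoints remain among the candidates, the selected parameters score at least as well as the best checkpoint under the chosen evaluation function.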

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2509.25876

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.75)

A Multimodal Stochastic Planning Approach for Navigation and Multi-Robot Coordination

Gonzales, Mark, Oh, Ethan, Moore, Joseph

arXiv.org Artificial Intelligence | Sep-24-2025

In this paper, we present a receding-horizon, sampling-based planner capable of reasoning over multimodal policy distributions. By using the cross-entropy method to optimize a multimodal policy under a common cost function, our approach increases robustness against local minima and promotes effective exploration of the solution space. We show that our approach naturally extends to multi-robot collision-free planning, enables agents to share diverse candidate policies to avoid deadlocks, and allows teams to minimize a global objective without incurring the computational complexity of centralized optimization. Numerical simulations demonstrate that employing multiple modes significantly improves success rates in trap environments and in multi-robot collision avoidance. Local minima pose a fundamental challenge for finite-horizon, gradient-based planning approaches.
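
A minimal sketch of the cross-entropy method run over several modes with one shared cost may help make the local-minimum argument concrete. It is a toy one-dimensional optimizer, not the planner from the paper: the `cost` function, the `multimodal_cem` name, the independent per-mode Gaussian updates, and all parameter values are assumptions chosen for illustration.

```python
import random
import statistics


def cost(x):
    # Hypothetical 1-D cost: a local minimum near x = -1 (cost 0.5)
    # and the global minimum at x = 2 (cost 0).
    return min((x + 1.0) ** 2 + 0.5, (x - 2.0) ** 2)


def multimodal_cem(cost_fn, mode_means, sigma=0.5, samples=50,
                   elites=10, iters=30, seed=0):
    """Run one cross-entropy update per mode under a shared cost."""
    rng = random.Random(seed)
    modes = [(mean, sigma) for mean in mode_means]
    for _ in range(iters):
        updated = []
        for mean, std in modes:
            # Sample from the mode and rank the samples by cost.
            draws = sorted((rng.gauss(mean, std) for _ in range(samples)),
                           key=cost_fn)
            elite = draws[:elites]
            # Refit the mode to its elites; floor the spread above zero.
            updated.append((statistics.fmean(elite),
                            max(statistics.pstdev(elite), 1e-3)))
        modes = updated
    # Return the mode mean with the lowest cost, plus all modes.
    best = min((mean for mean, _ in modes), key=cost_fn)
    return best, modes


best, modes = multimodal_cem(cost, mode_means=[-1.5, 2.5])
print(best, modes)
```

In this toy setup the mode started at -1.5 settles in the local minimum near -1, while the mode started at 2.5 finds the global minimum near 2. Keeping both modes and comparing them under the common cost is what lets the search avoid the trap.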

artificial intelligence, optimization, trajectory, (16 more...)

arXiv.org Artificial Intelligence

2509.19168

Genre: Research Report (0.50)

Industry: Transportation (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.67)
Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.66)

Prescribe-then-Select: Adaptive Policy Selection for Contextual Stochastic Optimization

Iglesias, Caio de Prospero, Carballo, Kimberly Villalobos, Bertsimas, Dimitris

arXiv.org Machine Learning | Sep-11-2025

We address the problem of policy selection in contextual stochastic optimization (CSO), where covariates are available as contextual information and decisions must satisfy hard feasibility constraints. In many CSO settings, multiple candidate policies--arising from different modeling paradigms--exhibit heterogeneous performance across the covariate space, with no single policy uniformly dominating. We propose Prescribe-then-Select (PS), a modular framework that first constructs a library of feasible candidate policies and then learns a meta-policy to select the best policy for the observed covariates. We implement the meta-policy using ensembles of Optimal Policy Trees trained via cross-validation on the training set, making policy choice entirely data-driven. Across two benchmark CSO problems--single-stage newsvendor and two-stage shipment planning--PS consistently outperforms the best single policy in heterogeneous regimes of the covariate space and converges to the dominant policy when such heterogeneity is absent. All the code to reproduce the results can be found at https://anonymous.4open.science/r/Prescribe-then-Select-TMLR.
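
The two-step structure described in the abstract (build a library of candidate policies, then learn which one to use for a given covariate) can be sketched on a toy newsvendor problem. The code below is an illustration under stated assumptions, not the authors' implementation: it swaps the ensembles of Optimal Policy Trees for a single threshold split, omits cross-validation, and uses hypothetical policies, cost parameters, and data.

```python
def newsvendor_cost(order, demand, under=2.0, over=1.0):
    # Penalize unmet demand and leftover stock (hypothetical unit costs).
    return under * max(demand - order, 0.0) + over * max(order - demand, 0.0)


def constant_policy(x):
    return 10.0  # hypothetical: always order 10 units


def linear_policy(x):
    return 2.0 * x  # hypothetical: order scales with the covariate


def fit_stump(policies, data):
    """Learn a one-split meta-policy: a threshold on x and a policy per side."""
    def best_policy(rows):
        totals = [sum(newsvendor_cost(p(x), d) for x, d in rows)
                  for p in policies]
        index = min(range(len(policies)), key=totals.__getitem__)
        return index, totals[index]

    best_fit = None
    for t in sorted({x for x, _ in data}):
        left_idx, left_cost = best_policy([r for r in data if r[0] <= t])
        right_idx, right_cost = best_policy([r for r in data if r[0] > t])
        total = left_cost + right_cost
        if best_fit is None or total < best_fit[0]:
            best_fit = (total, t, left_idx, right_idx)
    _, t, left_idx, right_idx = best_fit
    return lambda x: policies[left_idx if x <= t else right_idx](x)


# Hypothetical training pairs (covariate, demand): demand is flat for
# small x and grows linearly for large x.
train = [(1, 10), (2, 10), (3, 10), (4, 10), (6, 12), (7, 14), (8, 16), (9, 18)]
meta = fit_stump([constant_policy, linear_policy], train)
print(meta(3), meta(8))
```

On this data each single policy incurs a total training cost of 40, while the learned meta-policy routes small covariates to the constant policy and large ones to the linear policy, for a total cost of 0. When one policy dominates everywhere, the same procedure picks that policy on both sides of the split.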

candidate policy, covariate space, optimization, (15 more...)

arXiv.org Machine Learning

2509.08194

Country: